
Using a basic example, we walk through the main steps of the methodology to show how it is used.

Preparing the data

To start, install the GeoPressureR package from GitHub using the following line:

devtools::install_github("Rafnuss/GeoPressureR")

We will be using the following libraries:
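The chunk listing the libraries did not survive rendering. Judging from the functions called later in this vignette, it presumably loads at least the following packages (this list is an assumption reconstructed from the code below, not the original chunk):

```r
library(GeoPressureR)
library(ggplot2)        # ggplot() figures
library(plotly)         # ggplotly() interactive figures
library(leaflet)        # interactive maps
library(leaflet.extras) # addFullscreenControl()
library(raster)         # metadata(), brick()
library(knitr)          # kable()
```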

To make this vignette faster to compute, we have already run all the code below and we load the results here. (The corresponding R chunks are set to eval = FALSE.)

load(system.file("extdata", "18LX_pressure_maps.rda", package = "GeoPressureR"))
load(system.file("extdata", "18LX_pressure_prob.rda", package = "GeoPressureR"))
load(system.file("extdata", "18LX_pressure_timeserie.rda", package = "GeoPressureR"))
pressure_timeserie_1 <- pressure_timeserie[[1]]

Reading geolocator data

In this example, we use data collected from a Great Reed Warbler Acrocephalus arundinaceus (18LX). Below, we read the geolocator data and crop it so that it starts on the equipment date and ends on the retrieval date.

pam_data <- pam_read(
  pathname = system.file("extdata", package = "GeoPressureR"),
  crop_start = "2017-06-20", crop_end = "2018-05-02"
)

Automatic classification of activity

We use k-means clustering to group periods of low and high activity. We then classify high-activity periods lasting more than 30 minutes as migratory activity. See the PALMr manual for other possible classifications.

pam_data <- pam_classify(pam_data, min_duration = 30)
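The idea behind this step can be sketched with stats::kmeans on a toy activity series; this is an illustrative reimplementation of the principle, not the actual code of pam_classify():

```r
# Toy activity series at 5-min resolution: rest, one migratory flight, rest.
set.seed(1)
act <- c(rnorm(100, mean = 2), rnorm(20, mean = 50), rnorm(50, mean = 2))

# k-means with two clusters separates low from high activity.
km <- kmeans(act, centers = 2)
is_high <- km$cluster == which.max(km$centers)

# Flag consecutive high-activity bouts longer than 30 minutes as migration.
bouts <- rle(is_high)
bout_min <- bouts$lengths * 5
is_migration_bout <- bouts$values & bout_min > 30
```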

Editing activity on TRAINSET

To ensure the high level of precision needed for the pressure match, we must manually edit the activity classification and the pressure timeseries to be matched. We suggest doing this with TRAINSET. A separate vignette dedicated to this exercise, including best practices and a sample code to get started, is available at Labelling tracks.

Use trainset_write() to export the automatically generated classifications in a csv file, which can be opened in TRAINSET: https://trainset.geocene.com/.

trainset_write(pam_data, pathname = system.file("extdata", package = "GeoPressureR"))
# browseURL("https://trainset.geocene.com/")

Screenshot of the manual classification in TRAINSET. See the labelling tracks vignette for more information.

When you have finished the manual editing, export the new csv file (TRAINSET will append -labeled to the name). Make sure to keep this classification file (e.g. under /data/).

To edit an existing classification, re-open the labelled csv file in TRAINSET. Read it back into R with trainset_read():

pam_data <- trainset_read(pam_data, pathname = system.file("extdata", package = "GeoPressureR"))

Identifying stationary periods

Based on the activity labeling, pam_sta() creates a table of stationary periods as illustrated below.

pam_data <- pam_sta(pam_data)
knitr::kable(head(pam_data$sta))
start end sta_id
2017-06-20 00:00:00 2017-08-04 19:50:00 1
2017-08-04 23:15:00 2017-08-05 19:30:00 2
2017-08-06 02:50:00 2017-08-06 19:15:00 3
2017-08-07 03:10:00 2017-08-07 19:15:00 4
2017-08-08 00:10:00 2017-08-29 18:40:00 5
2017-08-30 04:30:00 2017-08-30 18:45:00 6
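Conceptually, pam_sta() collapses consecutive measurements sharing a sta_id into one row of this table. A minimal sketch on a hypothetical labelled data frame (the column names mirror those above; the dates are made up):

```r
# Hypothetical labelled timeseries: sta_id 0 marks flight, >0 stationary periods.
df <- data.frame(
  date = seq(as.POSIXct("2017-08-04 00:00", tz = "UTC"), by = "30 min", length.out = 8),
  sta_id = c(1, 1, 1, 0, 0, 2, 2, 2)
)

# Collapse each stationary period into a start/end/sta_id row.
keep <- df$sta_id > 0
sta <- do.call(rbind, lapply(split(df[keep, ], df$sta_id[keep]), function(d) {
  data.frame(start = min(d$date), end = max(d$date), sta_id = d$sta_id[1])
}))
```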

We can visualize the pressure measurements of each stationary period (shown in a different colour). The black dots represent pressure datapoints labelled as outliers; these datapoints will not be matched.

pressure_na <- pam_data$pressure
pressure_na$obs[pressure_na$isoutliar | pressure_na$sta_id == 0] <- NA
p <- ggplot() +
  geom_line(data = pam_data$pressure, aes(x = date, y = obs), col = "grey") +
  geom_line(data = pressure_na, aes(x = date, y = obs, col = as.factor(sta_id))) +
  geom_point(data = subset(pam_data$pressure, isoutliar), aes(x = date, y = obs), colour = "black") +
  theme_bw() +
  scale_y_continuous(name = "Pressure (hPa)") +
  scale_colour_manual(values = rep(RColorBrewer::brewer.pal(9, "Set1"), times = 8))

ggplotly(p, dynamicTicks = T) %>%
  layout(
    showlegend = F,
    legend = list(orientation = "h", x = -0.5),
    yaxis = list(title = "Pressure [hPa]")
  )

Computing the map of pressure

Now that we have a clean pressure timeseries for each stationary period, we are ready to match each one with atmospheric pressure data (ERA5). To overcome the challenge of computing the mismatch over such a large dataset, this R package uses the GeoPressure API to perform the computation on Google Earth Engine.

Initially, it is easier and faster to query only long stationary periods (in the example below, we select only periods longer than 12hrs). You can do so by setting the pressure of the stationary periods you wish to discard to NA.

sta_id_keep <- pam_data$sta$sta_id[difftime(pam_data$sta$end, pam_data$sta$start, units = "hours") > 12]
pam_data$pressure$sta_id[!(pam_data$pressure$sta_id %in% sta_id_keep)] <- NA

We can now query the data on the API with geopressure_map(). A detailed description of the parameters can be found here. This will take a couple of minutes to run.

extent <- c(50, -16, 0, 23) # coordinates of the map to request (N, W, S, E)
scale <- 2 # request on a 1/2=0.5° grid to make the code faster
max_sample <- 250 # limit the query to the first 250 datapoints.
margin <- 30 # roughly equivalent to 3hPa
pressure_maps <- geopressure_map(pam_data$pressure, extent = extent, scale = scale, max_sample = max_sample, margin = margin)
usethis::use_data(pressure_maps, overwrite = T)

geopressure_map() returns a list of two rasters for each stationary period. The first is the mean square error (\(\textbf{MSE}\)) between the pressure timeseries and the ERA5 pressure map. The second (\(\textbf{z}_{thr}\)) is the proportion of datapoints in the pressure timeseries corresponding to an altitude that falls between the minimum and maximum altitude of each grid cell. Read more about these values and how they are computed here.

We then combine the two rasters into a single probability map using \[\textbf{P} = \exp \left(-w \frac{\textbf{MSE}}{s} \right) [\textbf{z}_{thr}>thr],\] where \(s\) is the standard deviation of pressure and \(thr\) is the threshold applied to the mask \(\textbf{z}_{thr}\). Because the auto-correlation of the timeseries is not accounted for in this equation, we use a log-linear pooling weight \(w=\log(n) - 1\), where \(n\) is the number of datapoints in the timeseries. This operation is described in publication […]. Another vignette describing the influence of log-linear pooling and the length of the timeseries will be added later.
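Numerically, the pooling amounts to the following; the matrices and the values of s, thr and n here are toy illustrations, not real outputs:

```r
mse   <- matrix(c(0.5, 2, 5, 10), nrow = 2)        # mean square error per grid cell [hPa^2]
z_thr <- matrix(c(0.95, 0.8, 0.99, 0.5), nrow = 2) # proportion within the altitude range
n   <- 200  # number of datapoints in the timeseries
s   <- 1    # standard deviation of pressure
thr <- 0.9  # threshold applied to z_thr

w <- log(n) - 1                           # log-linear pooling weight
prob <- exp(-w * mse / s) * (z_thr > thr) # probability, zeroed where the mask fails
```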

s <- 1 # standard deviation of pressure
thr <- 0.9 # threshold on the proportion z_thr deemed acceptable
pressure_prob <- geopressure_prob_map(pressure_maps, s = s, thr = thr)
usethis::use_data(pressure_prob, overwrite = T)

We use leaflet() to visualize the threshold mask, mismatch map, and overall probability map for a single stationary period.

i_r <- 2
leaflet(width = "100%") %>%
  addProviderTiles(providers$Stamen.TerrainBackground) %>%
  addFullscreenControl() %>%
  addRasterImage(pressure_prob[[i_r]], opacity = 0.8, colors = "OrRd", group = "Probability") %>%
  addRasterImage(pressure_maps[[i_r]][[1]], opacity = 0.8, colors = "OrRd", group = "Mismatch") %>%
  addRasterImage(pressure_maps[[i_r]][[2]], opacity = 0.8, colors = "OrRd", group = "Threshold") %>%
  # addLegend(pal = pal, values = values(v[[i_s]][[3]]), title = "Probability") %>%
  addLayersControl(
    overlayGroups = c("Probability", "Mismatch", "Threshold"),
    options = layersControlOptions(collapsed = FALSE)
  ) %>%
  hideGroup(c("Mismatch", "Threshold"))

We can also visualize the probability map for all stationary periods:

li_s <- list()
l <- leaflet(width = "100%") %>%
  addProviderTiles(providers$Stamen.TerrainBackground) %>%
  addFullscreenControl()
for (i_r in seq_along(pressure_prob)) {
  i_s <- metadata(pressure_prob[[i_r]])$sta_id
  info <- pam_data$sta[pam_data$sta$sta_id == i_s, ]
  info_str <- paste0(i_s, " | ", format(info$start, "%d-%b %H:%M"), " -> ", format(info$end, "%d-%b %H:%M"))
  li_s <- append(li_s, info_str)
  l <- l %>% addRasterImage(pressure_prob[[i_r]], opacity = 0.8, colors = "OrRd", group = info_str)
}
l %>%
  addLayersControl(
    overlayGroups = li_s,
    options = layersControlOptions(collapsed = FALSE)
  ) %>%
  hideGroup(tail(li_s, length(li_s) - 1))
raster::animate(brick(pressure_prob), n = 1)

Computing altitude

The second operation you can perform with GeoPressureR is to compute the exact altitude of the bird \(z_{gl}\) from its pressure measurement \(P_{gl}\), assuming its location \(x\). This function uses ERA5 to adjust the barometric equation, \[ z_{gl}(x)=z_{ERA5}(x) + \frac{T_{ERA5}(x)}{L_b} \left( \left( \frac{P_{gl}}{P_{ERA5}(x)} \right) ^{\frac{-R L_b}{g M}} - 1 \right),\] where \(z_{ERA5}\), \(T_{ERA5}\) and \(P_{ERA5}\) respectively correspond to the ground-level elevation, the temperature at 2 m and the ground-level pressure of ERA5, \(L_b\) is the standard temperature lapse rate, \(R\) is the universal gas constant, \(g\) is the gravity constant and \(M\) is the molar mass of air. See more information here.
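The equation can be written as a small R function; the constants appear again later in this vignette, while the ERA5 values passed in below are hypothetical, just to show the shape of the computation:

```r
Lb <- -0.0065   # standard temperature lapse rate [K/m]
R  <- 8.31432   # universal gas constant [J/(mol*K)]
g  <- 9.80665   # gravity [m/s^2]
M  <- 0.0289644 # molar mass of air [kg/mol]

# Corrected barometric equation: geolocator pressure p_gl [hPa], with ERA5
# ground elevation z_era [m], 2-m temperature t_era [K] and ground-level
# pressure p_era [hPa] at the assumed location.
altitude_corrected <- function(p_gl, z_era, t_era, p_era) {
  z_era + t_era / Lb * ((p_gl / p_era)^(-R * Lb / (g * M)) - 1)
}

# e.g. 900 hPa measured, ERA5 ground at 100 m, 290 K, 1000 hPa
altitude_corrected(900, 100, 290, 1000)
```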

We can compute the bird’s elevation for its first stationary period using the most likely position on the probability map.

i_r <- 1
i_s <- metadata(pressure_prob[[i_r]])$sta_id
tmp <- as.data.frame(pressure_prob[[i_r]], xy = T)
lon <- tmp$x[which.max(tmp[, 3])]
lat <- tmp$y[which.max(tmp[, 3])]

We then call the function geopressure_ts() with the subset of pressure containing sta_id == 1:

pressure_timeserie_1 <- geopressure_ts(lon, lat, pressure = subset(pam_data$pressure, sta_id == 1))

We can compare the altitude produced to the one computed without the correction for temperature and pressure:

Lb <- -0.0065
R <- 8.31432
g0 <- 9.80665
M <- 0.0289644
T0 <- 273.15 + 15
P0 <- 1013.25
pressure_timeserie_1$altitude_baro <- T0 / Lb * ((pressure_timeserie_1$pressure / P0)^(-R * Lb / g0 / M) - 1)

and visualize this comparison:

p <- ggplot() +
  geom_line(data = as.data.frame(pressure_timeserie_1), aes(x = date, y = altitude, col = as.factor("Corrected elevation with ERA5"))) +
  geom_line(data = as.data.frame(pressure_timeserie_1), aes(x = date, y = altitude_baro, col = as.factor("Uncorrected elevation"))) +
  labs(col = "") +
  theme_bw()

ggplotly(p) %>%
  layout(legend = list(orientation = "h", x = -0.5))

The function geopressure_ts() also returns the ground level pressure timeseries from ERA5 at the location specified. This is useful to check whether there is a good match between the pressure measured by the geolocator and the one at the assumed location. This operation is typically used to check the quality of the manual labelling (see the vignette on how to label tracks).
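A quick way to assess that match can be sketched on synthetic numbers standing in for the geolocator and ERA5 series: after removing the constant offset due to the bird's altitude above ground level, the spread of the residuals should be small if the labelling is clean.

```r
# Synthetic stand-ins for geolocator pressure and ERA5 surface pressure [hPa].
obs  <- c(1008.0, 1007.5, 1006.0, 1005.0, 1006.2)
era5 <- c(1007.8, 1007.2, 1006.3, 1004.6, 1006.0)

# Remove the mean offset (due to the bird's altitude above ground level),
# then look at the spread of the residuals.
residual <- (obs - mean(obs)) - (era5 - mean(era5))
sd(residual) # a small value indicates a good match
```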

Computing pressure and altitude of the full path

We can repeat the computation of the pressure timeseries for all stationary periods. First, we compute the most likely position of each stationary period from the pressure probability maps.

path <- geopressure_map2path(pressure_prob)

We need a quick fix for i_s = 5, where the centre of the grid cell falls in water. This happens because geopressure_map() accepts any grid cell containing some land, while geopressure_map2path() returns the cell centre, which can therefore fall on water. We shift the latitude by a quarter of a degree to move the position back onto land.

path$lat[5] <- path$lat[5] + .25

Secondly, we use geopressure_ts_path(), which calls geopressure_ts() in parallel for all stationary periods.

pressure_timeserie <- geopressure_ts_path(path, pam_data$pressure)
col <- rep(RColorBrewer::brewer.pal(9, "Set1"), times = ceiling((nrow(pam_data$sta) + 1) / 9))
col <- col[1:(nrow(pam_data$sta) + 1)]
names(col) <- levels(factor(c(0, pam_data$sta$sta_id)))

p <- ggplot() +
  geom_line(data = pam_data$pressure, aes(x = date, y = obs), colour = "grey") +
  geom_point(data = subset(pam_data$pressure, isoutliar), aes(x = date, y = obs), colour = "black") +
  # geom_line(data = pressure_na, aes(x = date, y = obs, color = factor(sta_id))) +
  geom_line(data = do.call("rbind", pressure_timeserie), aes(x = date, y = pressure0, col = factor(sta_id)), linetype = 2) +
  theme_bw() +
  scale_colour_manual(values = col) +
  scale_y_continuous(name = "Pressure (hPa)")

ggplotly(p, dynamicTicks = T) %>% layout(showlegend = F)
sta_duration <- unlist(lapply(pressure_prob, function(x) {
  as.numeric(difftime(metadata(x)$temporal_extent[2], metadata(x)$temporal_extent[1], units = "days"))
}))
pal <- colorFactor(col, as.factor(seq_len(length(col))))
leaflet() %>%
  addProviderTiles(providers$Stamen.TerrainBackground) %>%
  addFullscreenControl() %>%
  addPolylines(lng = path$lon, lat = path$lat, opacity = 0.7, weight = 1, color = "#808080") %>%
  addCircles(
    lng = path$lon, lat = path$lat, opacity = 1,
    color = pal(factor(path$sta_id, levels = pam_data$sta$sta_id)),
    weight = sta_duration^(0.3) * 10
  )